This optional lab demonstrates linear regression with a popular open-source toolkit. Scikit-learn is an open-source machine learning library used by practitioners at many of the world's leading AI, internet, and machine learning companies.
If you use machine learning in your job, now or in the future, there is a good chance you will reach for tools like scikit-learn to train your models.
These notes are part of my data science learning journey through DeepLearning.AI and my graduate program, the Master of Applied Data Science (MADS) at the University of Michigan, along with Coursera and DataCamp. You can find similar articles and more stories on my Medium and LinkedIn profiles, and related notebooks on my Kaggle and GitHub pages. Thank you for your motivation, support, and valuable feedback.
These materials include projects, coursework, and notebooks from my data science journey, kept for reproducibility and future reference only. All source code, slides, and screenshots are the intellectual property of their respective authors. If you find this content beneficial, kindly consider a learning subscription from DeepLearning.AI, Coursera, or DataCamp.
Optional Lab: Linear Regression using Scikit-Learn
There is an open-source, commercially usable machine learning toolkit called scikit-learn. This toolkit contains implementations of many of the algorithms that you will work with in this course.
Goals
In this lab you will:

- Utilize scikit-learn to implement linear regression using gradient descent
Tools
You will utilize functions from scikit-learn as well as matplotlib and NumPy.
```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import SGDRegressor
from sklearn.preprocessing import StandardScaler
from lab_utils_multi import load_house_data
from lab_utils_common import dlc
np.set_printoptions(precision=2)
plt.style.use('deeplearning.mplstyle')

# load the house dataset and feature names used throughout this lab
X_train, y_train = load_house_data()
X_features = ['size(sqft)', 'bedrooms', 'floors', 'age']
```
Gradient Descent
Scikit-learn has a gradient descent regression model sklearn.linear_model.SGDRegressor. Like your previous implementation of gradient descent, this model performs best with normalized inputs. sklearn.preprocessing.StandardScaler will perform z-score normalization as in a previous lab. Here it is referred to as ‘standard score’.
```python
scaler = StandardScaler()
X_norm = scaler.fit_transform(X_train)
print(f"Peak to Peak range by column in Raw X:{np.ptp(X_train,axis=0)}")
print(f"Peak to Peak range by column in Normalized X:{np.ptp(X_norm,axis=0)}")
```
```
Peak to Peak range by column in Raw X:[2.41e+03 4.00e+00 1.00e+00 9.50e+01]
Peak to Peak range by column in Normalized X:[5.85 6.14 2.06 3.69]
```
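Under the hood, `StandardScaler` stores the per-column mean and standard deviation it learned from the training data, and z-score normalization is simply `(x - mean) / std` applied column-wise. A minimal sketch to confirm this, using scikit-learn's `mean_` and `scale_` attributes:

```python
# StandardScaler exposes the per-column statistics learned in fit_transform:
#   scaler.mean_  -> column means
#   scaler.scale_ -> column standard deviations
# z-score normalization recomputed by hand should match X_norm
X_check = (X_train - scaler.mean_) / scaler.scale_
print(f"manual z-score matches StandardScaler: {np.allclose(X_check, X_norm)}")
```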
Create and fit the regression model
```python
sgdr = SGDRegressor(max_iter=1000)
sgdr.fit(X_norm, y_train)
print(sgdr)
print(f"number of iterations completed: {sgdr.n_iter_}, number of weight updates: {sgdr.t_}")
```
```
SGDRegressor()
number of iterations completed: 108, number of weight updates: 10693.0
```
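The weight-update count follows from how stochastic gradient descent works: the weights are updated once per training example per epoch, so per the scikit-learn documentation `t_` equals `n_iter_ * n_samples + 1`. A quick check against the training set loaded above:

```python
# t_ counts one weight update per example per epoch, plus one
# (this matches the scikit-learn documentation for SGDRegressor.t_)
m = X_norm.shape[0]
print(f"n_iter_ * m + 1 = {sgdr.n_iter_ * m + 1}, t_ = {sgdr.t_}")
# with the 99 training examples here: 108 * 99 + 1 = 10693
```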
View parameters
Note, the parameters are associated with the normalized input data. The fit parameters are very close to those found in the previous lab with this data.
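The code cell that reads out the fitted parameters appears to have been lost in conversion; it would look like the following, using `SGDRegressor`'s `coef_` and `intercept_` attributes (this also defines `w_norm` and `b_norm`, which the prediction cell below relies on):

```python
# the fitted parameters live in the model's coef_ and intercept_ attributes
b_norm = sgdr.intercept_
w_norm = sgdr.coef_
print(f"model parameters:                   w: {w_norm}, b:{b_norm}")
print(f"model parameters from previous lab: w: [110.56 -21.27 -32.71 -37.97], b: 363.16")
```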
```
model parameters:                   w: [109.82 -20.91 -32.29 -38.11], b:[363.18]
model parameters from previous lab: w: [110.56 -21.27 -32.71 -37.97], b: 363.16
```
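Because the model was trained on z-score normalized features, these parameters live in the normalized feature space. If you want parameters that apply to the raw features, you can fold the scaling back in: since $x_{norm} = (x - \mu)/\sigma$, the prediction $w_{norm} \cdot x_{norm} + b_{norm}$ rearranges to $(w_{norm}/\sigma) \cdot x + (b_{norm} - w_{norm} \cdot \mu/\sigma)$. A short sketch of this idea (not part of the original lab):

```python
# recover parameters in the original (unnormalized) feature space;
# since x_norm = (x - mean) / scale:
#   y = w_norm @ x_norm + b_norm
#     = (w_norm / scale) @ x + (b_norm - w_norm @ (mean / scale))
w_orig = w_norm / scaler.scale_
b_orig = b_norm - np.dot(w_norm, scaler.mean_ / scaler.scale_)
print(f"parameters in original feature space: w: {w_orig}, b: {b_orig}")
```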
Make predictions
Predict the targets of the training data using both the `predict` routine and a direct computation with $w$ and $b$.
```python
# make a prediction using sgdr.predict()
y_pred_sgd = sgdr.predict(X_norm)
# make a prediction using w,b
y_pred = np.dot(X_norm, w_norm) + b_norm
print(f"prediction using np.dot() and sgdr.predict match: {(y_pred == y_pred_sgd).all()}")
print(f"Prediction on training set:\n{y_pred[:4]}")
print(f"Target values \n{y_train[:4]}")
```
```
prediction using np.dot() and sgdr.predict match: True
Prediction on training set:
[295.2 485.84 389.65 492. ]
Target values 
[300. 509.8 394. 540. ]
```
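One practical note: any new example must be normalized with the same scaler that was fit on the training data before calling `predict`. A minimal sketch, using a hypothetical house (the feature values below are made up for illustration):

```python
# a hypothetical house: 1200 sqft, 3 bedrooms, 1 floor, 40 years old
x_house = np.array([[1200, 3, 1, 40]])
# reuse the scaler fit on the training data -- do not re-fit on new data
x_house_norm = scaler.transform(x_house)
print(f"predicted price: {sgdr.predict(x_house_norm)}")
```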
Plot Results
Let’s plot the predictions versus the target values.
```python
# plot predictions and targets vs original features
fig, ax = plt.subplots(1, 4, figsize=(12, 3), sharey=True)
for i in range(len(ax)):
    ax[i].scatter(X_train[:, i], y_train, label='target')
    ax[i].set_xlabel(X_features[i])
    ax[i].scatter(X_train[:, i], y_pred, color=dlc["dlorange"], label='predict')
ax[0].set_ylabel("Price"); ax[0].legend()
fig.suptitle("target versus prediction using z-score normalized model")
plt.show()
```
Congratulations!
In this lab you:

- utilized an open-source machine learning toolkit, scikit-learn
- implemented linear regression using gradient descent and feature normalization from that toolkit